City of Caloocan
Fine-Tuning Pre-trained Language Models to Detect In-Game Trash Talks
Fesalbon, Daniel, De La Cruz, Arvin, Mallari, Marvin, Rodelas, Nelson
Common problems in online mobile and computer games relate to toxic behavior and abusive communication among players. Drawing on various reports and studies, this study discusses the impact of online hate speech and toxicity on players' in-game performance and overall well-being. It investigates the capability of pre-trained language models to classify or detect trash talk and toxic in-game messages. The study employs and evaluates pre-trained BERT and GPT language models for detecting toxicity within in-game chats. Using publicly available APIs, in-game chat data from DOTA 2 matches were collected, processed, reviewed, and labeled as non-toxic, mild (toxicity), or toxic. Around two thousand in-game chats were collected to train and test the BERT (Base-uncased), BERT (Large-uncased), and GPT-3 models. Based on the state-of-the-art performance of the three models, the study concludes that pre-trained language models show promising potential for addressing online hate speech and insulting in-game trash talk.
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
- Asia > Singapore (0.05)
- Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
- Asia > Philippines > Luzon > National Capital Region > City of Caloocan (0.04)
GULP: Solar-Powered Smart Garbage Segregation Bins with SMS Notification and Machine Learning Image Processing
Sigongan, Jerome B., Sinodlay, Hamer P., Cuizon, Shahida Xerxy P., Redondo, Joanna S., Macapulay, Maricel G., Bulahan-Undag, Charlene O., Gumonan, Kenn Migan Vincent C.
This study aims to build a smart bin that segregates solid waste into its respective bins. Its further objectives are to make the waste management process more interesting for end users, to notify utility staff when the smart bin needs to be unloaded, and to promote an environment-friendly smart bin by using renewable solar energy. The researchers employed an Agile development approach because it enables teams to manage their workloads successfully and create the highest-quality product while staying within their allocated budget. Its six fundamental phases are planning, design, development, testing, release, and feedback. Overall quality testing based on the ISO/IEC 25010 evaluation yielded a positive outcome: the overall average was 4.55, which is verbally interpreted as excellent. Additionally, the application can run independently on its solar energy source. Users were able to enjoy the whole process of waste disposal through its interesting mechanisms. Based on the findings, a compressor is recommended to compact the trash when it reaches the maximum level, creating room for more garbage. An algorithm that can identify multiple pieces of garbage at a time is also recommended. Adding a solar tracker coupled with the solar panel would help produce more renewable energy for the smart bin.
- Asia > Malaysia (0.04)
- Asia > India > Maharashtra > Pune (0.04)
- Asia > Vietnam (0.04)
- Water & Waste Management > Solid Waste Management (1.00)
- Energy > Renewable > Solar (1.00)
Performance Evaluation of Regression Models in Predicting the Cost of Medical Insurance
Cenita, Jonelle Angelo S., Asuncion, Paul Richie F., Victoriano, Jayson M.
The study aimed to evaluate the performance of regression models in predicting the cost of medical insurance. Three (3) machine learning regression models were used: Linear Regression, Gradient Boosting, and Support Vector Machine. Performance was evaluated using RMSE (Root Mean Square Error), R² (R-squared), and K-Fold Cross-validation. The study also sought to pinpoint the feature that is most important in predicting the cost of medical insurance. The study is anchored on the knowledge discovery in databases (KDD) process, which refers to the overall process of discovering useful knowledge from data. The performance evaluation results reveal that, among the three (3) regression models, Gradient Boosting achieved the highest R² (0.892) and the lowest RMSE (1336.594). Furthermore, the 10-Fold Cross-validation weighted mean findings are not significantly different from the R² results of the three (3) regression models. In addition, Exploratory Data Analysis (EDA) using box plots of descriptive statistics showed that, for the charges and smoker features, the median of one group lies outside the box of the other group, so there is a difference between the two groups. The study concludes that Gradient Boosting appears to perform best among the three (3) regression models, and the K-Fold Cross-Validation indicates that all three (3) regression models perform well. Moreover, the EDA box plots indicate that the highest charges are associated with the smoker feature.
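The evaluation metrics named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function names, the contiguous fold split, and the mean-value baseline are all assumptions made for the example, and the toy numbers are invented rather than taken from the insurance dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validated_rmse(y, k):
    """Average test RMSE over k folds for a hypothetical baseline that
    always predicts the training-fold mean (a stand-in for a real model)."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        mean_train = sum(y[i] for i in train) / len(train)
        scores.append(rmse([y[i] for i in test], [mean_train] * len(test)))
    return sum(scores) / len(scores)

# Invented toy values, for illustration only.
charges_true = [1000.0, 2000.0, 3000.0, 4000.0]
charges_pred = [1100.0, 1900.0, 3200.0, 3800.0]
print(f"RMSE = {rmse(charges_true, charges_pred):.3f}")
print(f"R2   = {r_squared(charges_true, charges_pred):.3f}")
print(f"3-fold baseline RMSE = {cross_validated_rmse(charges_true + [5000.0, 6000.0], 3):.3f}")
```

A lower RMSE and a higher R-squared both indicate a better fit, which is the sense in which the abstract ranks the three models; averaging a metric over folds gives an estimate that depends less on any single train/test split.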
- South America > Paraguay > Asunción > Asunción (0.05)
- Asia > Middle East > Jordan (0.04)
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
- Banking & Finance > Insurance (1.00)
- Health & Medicine > Health Care Providers & Services > Reimbursement (0.86)
Smart Metro: Deep Learning Approaches to Forecasting the MRT Line 3 Ridership
Empino, Jayrald, Junsay, Jean Allyson, Verzon, Mary Grace, Abisado, Mideth, Huyo-a, Shekinah Lor, Sampedro, Gabriel Avelino
Since its establishment in 1999, the Metro Rail Transit Line 3 (MRT3) has served as a transportation option for numerous passengers in Metro Manila, Philippines. The Philippine government's Department of Transportation (DOTr) records more than a thousand people using the MRT3 daily, and forecasting the daily passenger count can be challenging. The MRT3's daily ridership fluctuates owing to variables such as holidays, working days, and other unexpected issues. Commuters do not know how many other commuters are on their route on a given day, which may hinder their ability to plan an efficient itinerary. Currently, the DOTr depends on spreadsheets containing historical data, which can be difficult to examine. This study presents a time series prediction of daily traffic to anticipate future ridership at a particular station on specific days.
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.58)
- Asia > Singapore (0.05)
- North America > United States (0.04)
A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course
This study aims to develop a predictive model that estimates students' probability of passing their courses at the earliest stage of the semester. To discover a predictive model with high acceptability, accuracy, and precision that delivers useful outcomes for decision making in education systems, improves the processes of conveying knowledge, and uplifts students' academic performance, the proponent strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. The study employs classification as the data mining technique and a decision tree as the algorithm. Using the resulting predictive model, the prediction of students' probability of passing their current courses yields 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 F1 score, which shows that the model is reliable, accurate, and recommendable. Considering the indicators and the results, the prediction model used in this study is highly acceptable. Data mining techniques provide effective and efficient innovative tools for analyzing and predicting student performance. The model used in this study can greatly affect the way educators understand and identify the weaknesses of their students, improve the effectiveness of the learning processes geared toward their students, bring down academic failure rates, and help institution administrators modify their learning system outcomes. Further study is highly needed on the inclusion of students' demographic information, a larger amount of data in the dataset, and automated and manual processing of predictive criteria indicators, so that students can determine, as early as the midterm period, which criteria they must improve in order to pass their courses at the end of the semester.
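The four scores reported in this abstract are all derived from a binary confusion matrix. The sketch below shows the formulas; the counts (15 true positives, 3 false positives, 2 false negatives, 1 true negative) are a hypothetical example chosen only because they are arithmetically consistent with the reported figures. They are not taken from the study, and the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted "pass" that truly passed
    recall = tp / (tp + fn)      # share of true "pass" that the model found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts (NOT from the study), with "pass" as the positive class.
metrics = classification_metrics(tp=15, fp=3, fn=2, tn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts the recall is 15/17, which rounds to 0.8824 and truncates to the 0.8823 quoted in the abstract. Reading the four scores together with the underlying counts matters because accuracy alone does not show how well the minority class is recognized.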
- Asia > Philippines > Luzon > Calabarzon > Province of Cavite (0.14)
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Materials > Metals & Mining (1.00)
- Education > Educational Setting (1.00)
- Education > Assessment & Standards > Student Performance (1.00)
Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems
Challa, Ashwini, Upasani, Kartikeya, Balakrishnan, Anusha, Subba, Rajen
Neural approaches to Natural Language Generation (NLG) have been promising for goal-oriented dialogue. One of the challenges of productionizing these approaches, however, is the ability to control response quality, and ensure that generated responses are acceptable. We propose the use of a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. While acceptability includes grammatical correctness and semantic correctness, we focus only on grammaticality classification in this paper, and show that existing datasets for grammatical error correction don't correctly capture the distribution of errors that data-driven generators are likely to make. We release a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems. We then explore two supervised learning approaches (CNNs and GBDTs) for classifying grammaticality. Our experiments show that grammaticality classification is very sensitive to the distribution of errors in the data, and that these distributions vary significantly with both the source of the response as well as the domain. We show that it's possible to achieve high precision with reasonable recall on our dataset.
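The generate, filter, and rank framework described in this abstract can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' system: the function names, the repeated-word filter (a toy stand-in for a trained grammaticality classifier), and the shortest-response scorer are all hypothetical.

```python
from typing import Callable, List, Optional

def generate_filter_rank(
    candidates: List[str],
    is_acceptable: Callable[[str], bool],
    score: Callable[[str], float],
) -> Optional[str]:
    """Drop unacceptable candidates, then return the highest-scoring survivor.

    Returns None when every candidate is filtered out, so the caller can
    fall back to another response source.
    """
    accepted = [c for c in candidates if is_acceptable(c)]
    if not accepted:
        return None
    return max(accepted, key=score)

def toy_is_acceptable(text: str) -> bool:
    """Hypothetical filter: reject responses with an immediately repeated word."""
    words = text.lower().split()
    return all(a != b for a, b in zip(words, words[1:]))

def toy_score(text: str) -> float:
    """Hypothetical ranker: prefer responses with fewer words."""
    return -float(len(text.split()))

# Invented weather-domain candidates, standing in for generator output.
candidates = [
    "It will will rain today",
    "It will rain today",
    "Rain is expected today in the afternoon",
]
best = generate_filter_rank(candidates, toy_is_acceptable, toy_score)
print(best)
```

Keeping the filter and the ranker as separate callables mirrors the separation in the abstract: the filter enforces acceptability as a hard constraint, while the ranker only orders the responses that pass it.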
- North America > United States > New Jersey > Ocean County (0.04)
- Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
- Asia > Philippines > Luzon > National Capital Region > City of Caloocan (0.04)